Using the 2017-2018 NHANES data, which you can read more about in the tab below, our group explores potential relationships between participants’ mental health, physical activity habits, and demographic traits. It is widely assumed that more physical activity goes hand in hand with better mental health. Using statistical methods, we test whether this assumption holds and which other factors may be associated with mental health status.
Given this common assumption, we hypothesize that poor mental health will correlate with fewer minutes of vigorous activity and more time spent sedentary.
These relationships are important to analyze as mental health concerns grow, especially in such difficult times. In 2017 and 2018, when these data were collected, nationwide stress was already mounting with increased natural disasters, growing political tensions, and other societal concerns. Since then, global stress levels have risen while the ability to exercise has decreased significantly in the midst of the pandemic. By analyzing these data from the pre-COVID era, we may recognize a correlation across physical activity, demographic group, and mental health that can be used to encourage changes in people’s daily habits that improve mental health.
The National Health and Nutrition Examination Survey (NHANES) is a program designed to assess the health and nutritional status of adults and children in the US. The survey examines a nationally representative sample of about 5,000 people each year. It combines interviews and physical examinations. The surveys are crafted to focus on various health topics and demographic groups.
For our project, we chose the 2017-2018 wave of data. These data are recent and contain a complete, uninterrupted collection period. Datasets are available covering demographics, dietary data, examination data, laboratory data, and questionnaire data. From the Questionnaire Data tab, we pulled the Mental Health - Depression Screener dataset and the Physical Activity dataset, because the questionnaire data seemed to be the broadest and most readily available. From the Demographics Data tab, we used the only available dataset.
The 2017-2018 Demographic dataset includes 46 variables. Of these, we plan to use “gender” and “age.” These data were collected at a screening. Gender is given as “male” or “female” and is coded in the raw NHANES file (RIAGENDR) as 1 for male and 2 for female. Age is the participant’s age in years at the time of screening. Individuals 80 and over are categorized as 80+ years of age; all other ages are given as integers.
The 2017-2018 Physical Activity dataset contains 17 quantitative variables. Of these, we plan to use “Minutes of Vigorous Recreational Activities” (PAD660) and “Minutes of Sedentary Activity” (PAD680). We will use cut() to bin minutes of vigorous activity into “low,” “moderate,” and “intense” levels and will similarly bin time spent sedentary into “low,” “moderate,” and “intense.”
The 2017-2018 Mental Health dataset has 11 variables, all answered categorically with multiple choice options. From these, we plan to use the following variables: “Feeling down, depressed, or hopeless,” “Trouble sleeping or sleeping too much,” “Feeling bad about yourself,” and “Trouble concentrating on things.” Each question refers to the 2 weeks prior to taking the survey. In the dataset they are denoted by DPQ020, DPQ030, DPQ060, and DPQ070. A scale is provided for each, with 0 representing “Not at all,” 1 representing “Several days,” 2 representing “More than half the days,” and 3 representing “Nearly every day.” The codes 7 and 9 represent “Refuse to answer” and “Don’t know,” respectively. These responses were removed from our final dataset.
To clean these data, we first selected the aforementioned variables of interest and removed the following entries from each data set: “Refuse to answer,” “Don’t know,” “NA”, and “missing.”
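The filtering step described above can be sketched on toy data in base R. The data frame `toy` and its values are invented for illustration; only the column names and the special codes 7 and 9 come from the dataset description, and the project's own code uses dplyr instead.

```r
# Toy stand-in for the depression screener: 7 = "Refuse to answer", 9 = "Don't know"
toy <- data.frame(
  SEQN   = 1:5,
  DPQ020 = c(0, 3, 7, 9, NA),
  DPQ030 = c(1, 2, 0, 1, 2)
)

answers <- toy[c("DPQ020", "DPQ030")]
special_codes <- c(7, 9)

# Flag rows in which any answer is one of the special codes
has_code <- rowSums(sapply(answers, function(col) col %in% special_codes)) > 0

# Keep a row only if every answer is present and none is a special code
clean <- toy[complete.cases(answers) & !has_code, ]
clean$SEQN
```

Only participants 1 and 2 survive here: participants 3 and 4 gave a special code, and participant 5 has a missing answer.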
Using dplyr and full_join(), we first joined two of the datasets by SEQN, a common variable that identifies each participant across the datasets, and created a new dataset. We then used full_join() again, with SEQN as the common variable, to combine all 3 datasets into one. Finally, we removed any rows containing “NA” entries, which keeps only participants with complete data in all three datasets. This resulted in a new set of 1,271 observations.
# Packages used throughout (read.xport() is assumed here to come from the foreign package)
library(foreign)
library(dplyr)
library(forcats)
library(ggplot2)
library(plotly)
library(stargazer)
# Selecting datasets
phys <- read.xport("PAQ_J.XPT") # Physical Activity
mental <- read.xport("DPQ_J.XPT") # Mental Health
demo <- read.xport("DEMO_J (1).XPT") # Demographic Data
# Select variables of interest; remove "NA", "Don't know", "Refuse to answer", and "missing" entries
phys1 <- phys %>% select(PAD660, PAD680, SEQN) %>% na.omit() %>% filter(!PAD660 %in% c("7777", "9999", ".") & !PAD680 %in% c("7777", "9999", ".") )
mental1 <- mental %>% select(DPQ020, DPQ030, DPQ060, DPQ070, SEQN) %>% na.omit() %>% filter(!DPQ020 %in% c("7", "9", "."), !DPQ030 %in% c("7", "9", "."), !DPQ060 %in% c("7", "9", "."), !DPQ070 %in% c("7", "9", "."))
demo1 <- demo %>% select(RIAGENDR, RIDAGEYR, RIDRETH3, SEQN) %>% na.omit() # selecting gender, age, and race
# Joining datasets by participant ID (SEQN) and dropping resulting NAs using dplyr
dat1 <- full_join(phys1, mental1, by = "SEQN")
dat2 <- full_join(dat1, demo1, by = "SEQN")
dat2 <- dat2 %>% na.omit()
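A minimal sketch, on invented toy tables, of what the join-then-drop pattern above does. The names `phys_toy` and `mental_toy` and their values are hypothetical; the point is that a full join followed by na.omit() keeps only the IDs present in both tables, which is the same set of rows that inner_join() returns.

```r
library(dplyr)

# Two toy tables that share the participant ID column SEQN
phys_toy   <- data.frame(SEQN = c(1, 2, 3), PAD660 = c(30, 60, 120))
mental_toy <- data.frame(SEQN = c(2, 3, 4), DPQ020 = c(0, 1, 3))

# full_join() keeps all four IDs and fills the gaps with NA
full <- full_join(phys_toy, mental_toy, by = "SEQN")

# Dropping incomplete rows leaves only IDs present in both tables
full_then_drop <- na.omit(full)

# inner_join() reaches the same rows in one step
inner <- inner_join(phys_toy, mental_toy, by = "SEQN")

full_then_drop$SEQN
inner$SEQN
```

This equivalence holds only when the input tables contain no missing values of their own, which is the case here because each dataset was passed through na.omit() before joining.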
From here, we renamed the variables to make them easier to keep track of throughout the project.
We then used as.factor() to turn some variables into factor variables for easier manipulation later on.
We then created new variables for all mental health variables and for gender (depr2, sleep2, feelBad2, concen2, and gender2) in order to re-code the responses from a numerical scale to an easier-to-understand scale. Although these may seem to be duplicates, we kept them as separate variables in case the numerical scales proved more helpful at another point during the project.
Using the physical activity variables, we similarly created 2 more variables in which minsPA and minsSed are grouped into 3 levels. We used cut() to split each numerical range into 3 levels: “low,” “moderate,” and “intense.” We binned age into three groups in the same way. This will be helpful for visualizing mental health differences between groups.
We then turned our new dataframe into a csv file to share easily between group members.
# Data cleaning
# renaming variables in dataset so they are easier to understand------------------------------------
dat2 <- dat2 %>%
rename(
minsPA = PAD660,
minsSed = PAD680,
IDNum = SEQN,
depr = DPQ020,
sleep = DPQ030,
feelBad = DPQ060,
concen = DPQ070,
gender = RIAGENDR,
age = RIDAGEYR,
race = RIDRETH3
)
# making ints into categorical variables as needed - depression screening questionnaire and demographic data
dat2$depr <- as.factor(dat2$depr)
dat2$sleep <- as.factor(dat2$sleep)
dat2$feelBad <- as.factor(dat2$feelBad)
dat2$concen <- as.factor(dat2$concen)
dat2$gender <- as.factor(dat2$gender)
# re-coding the responses using forcats ---------------------------------------------------------------------------
dat2 <- dat2 %>%
mutate(depr2 = fct_recode(depr,
"Not at all" = "0",
"Several Days" = "1",
"More than half the days" = "2",
"Nearly every day" = "3"),
sleep2 = fct_recode(sleep,
"Not at all" = "0",
"Several Days" = "1",
"More than half the days" = "2",
"Nearly every day" = "3"),
feelBad2 = fct_recode(feelBad,
"Not at all" = "0",
"Several Days" = "1",
"More than half the days" = "2",
"Nearly every day" = "3"),
concen2 = fct_recode(concen,
"Not at all" = "0",
"Several Days" = "1",
"More than half the days" = "2",
"Nearly every day" = "3"),
gender2 = fct_recode(gender, # NHANES codes RIAGENDR as 1 = Male, 2 = Female
"Male" = "1",
"Female" = "2")
)
# Making Mental Health vars binary for regression analysis and saving as var3
dat2 <- dat2 %>%
mutate(depr3 = fct_recode(depr,
"0" = "0",
"1"= "1",
"1" = "2",
"1" = "3"),
sleep3 = fct_recode(sleep,
"0" = "0",
"1"= "1",
"1" = "2",
"1" = "3"),
feelBad3 = fct_recode(feelBad,
"0" = "0",
"1"= "1",
"1" = "2",
"1" = "3"),
concen3 = fct_recode(concen,
"0" = "0",
"1"= "1",
"1" = "2",
"1" = "3"))
dat2$depr3 <- as.factor(dat2$depr3)
dat2$sleep3 <- as.factor(dat2$sleep3)
dat2$feelBad3 <- as.factor(dat2$feelBad3)
dat2$concen3 <- as.factor(dat2$concen3)
# categorize phys activity data----------------------------------------------------------------------------------
dat2 <- data.frame(dat2)
dat2$minsPAlevel = cut(dat2$minsPA, c(0,75,250, 480), labels = c("low", "moderate", "intense"))
dat2$minsSedlevel = cut(dat2$minsSed, c(0,380,760, 1140), labels = c("low", "moderate", "intense"))
dat2$agelevel = cut(dat2$age, c(17,35,60,80), labels = c("young", "middle aged", "old"))
# turn into csv file for other students to use
write.csv(dat2, "~/Desktop/Senior Yr/QTM 151/projectdata.csv", row.names = FALSE)
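To make the binning behaviour of cut() concrete, here is a small sketch using the same break points as the minsPAlevel variable above. The vector `mins` is invented for illustration. By default cut() builds intervals that are open on the left and closed on the right, so a value equal to the lowest break, or above the highest one, falls outside every bin and becomes NA.

```r
# Same breaks and labels as minsPAlevel; the input values are made up
mins <- c(0, 10, 75, 76, 250, 480, 600)
level <- cut(mins, breaks = c(0, 75, 250, 480),
             labels = c("low", "moderate", "intense"))

# Bins are (0, 75], (75, 250], (250, 480]
data.frame(mins, level)
```

In this toy input, 0 and 600 are returned as NA, 75 lands in "low", and 250 lands in "moderate". If boundary values like these occur in the real data, arguments such as include.lowest = TRUE or wider breaks would be worth considering.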
Here we will take a look at our data categorically. Navigate using the tabs below.
# Figures 1-4: depr2 and sleep2 by level of sedentary activity and level of physical activity
graph1 <- ggplot(dat2, aes(x=depr2, group= minsSedlevel)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..), stat = "count")) +
geom_text(aes(label = scales::percent(..prop..), y= ..prop.. ), stat= "count", vjust= -.5)+
facet_wrap(~minsSedlevel)+
labs(x="Level of depression", y="Proportion", title="Figure 1: Depression levels separated by level of sedentary activity", fill= "depr2") +
scale_fill_brewer(name = 'Levels', breaks = 1:4,
labels = levels(dat2$depr2), palette = 'Set2')+
scale_y_continuous(labels = scales::percent_format())+
theme_minimal()+
theme(axis.text.x = element_blank(), plot.title = element_text(vjust = 1, size=12))
graph1
data1 <-ggplot_build(graph1)
head(data1$data)
## [[1]]
## fill y count prop x stat group PANEL ymin ymax
## 1 #66C2A5 0.82113821 707 0.82113821 1 count 1 1 0 0.82113821
## 2 #FC8D62 0.12427410 107 0.12427410 2 count 1 1 0 0.12427410
## 3 #8DA0CB 0.03135889 27 0.03135889 3 count 1 1 0 0.03135889
## 4 #E78AC3 0.02322880 20 0.02322880 4 count 1 1 0 0.02322880
## 5 #66C2A5 0.81250000 312 0.81250000 1 count 2 2 0 0.81250000
## 6 #FC8D62 0.15885417 61 0.15885417 2 count 2 2 0 0.15885417
## 7 #8DA0CB 0.01562500 6 0.01562500 3 count 2 2 0 0.01562500
## 8 #E78AC3 0.01302083 5 0.01302083 4 count 2 2 0 0.01302083
## 9 #66C2A5 0.88461538 23 0.88461538 1 count 3 3 0 0.88461538
## 10 #FC8D62 0.03846154 1 0.03846154 2 count 3 3 0 0.03846154
## 11 #8DA0CB 0.03846154 1 0.03846154 3 count 3 3 0 0.03846154
## 12 #E78AC3 0.03846154 1 0.03846154 4 count 3 3 0 0.03846154
## xmin xmax colour size linetype alpha
## 1 0.55 1.45 NA 0.5 1 NA
## 2 1.55 2.45 NA 0.5 1 NA
## 3 2.55 3.45 NA 0.5 1 NA
## 4 3.55 4.45 NA 0.5 1 NA
## 5 0.55 1.45 NA 0.5 1 NA
## 6 1.55 2.45 NA 0.5 1 NA
## 7 2.55 3.45 NA 0.5 1 NA
## 8 3.55 4.45 NA 0.5 1 NA
## 9 0.55 1.45 NA 0.5 1 NA
## 10 1.55 2.45 NA 0.5 1 NA
## 11 2.55 3.45 NA 0.5 1 NA
## 12 3.55 4.45 NA 0.5 1 NA
##
## [[2]]
## y label count prop x width group PANEL colour size angle
## 1 0.82113821 82.1% 707 0.82113821 1 0.9 1 1 black 3.88 0
## 2 0.12427410 12.4% 107 0.12427410 2 0.9 1 1 black 3.88 0
## 3 0.03135889 3.1% 27 0.03135889 3 0.9 1 1 black 3.88 0
## 4 0.02322880 2.3% 20 0.02322880 4 0.9 1 1 black 3.88 0
## 5 0.81250000 81.2% 312 0.81250000 1 0.9 2 2 black 3.88 0
## 6 0.15885417 15.9% 61 0.15885417 2 0.9 2 2 black 3.88 0
## 7 0.01562500 1.6% 6 0.01562500 3 0.9 2 2 black 3.88 0
## 8 0.01302083 1.3% 5 0.01302083 4 0.9 2 2 black 3.88 0
## 9 0.88461538 88.5% 23 0.88461538 1 0.9 3 3 black 3.88 0
## 10 0.03846154 3.8% 1 0.03846154 2 0.9 3 3 black 3.88 0
## 11 0.03846154 3.8% 1 0.03846154 3 0.9 3 3 black 3.88 0
## 12 0.03846154 3.8% 1 0.03846154 4 0.9 3 3 black 3.88 0
## hjust vjust alpha family fontface lineheight
## 1 0.5 -0.5 NA 1 1.2
## 2 0.5 -0.5 NA 1 1.2
## 3 0.5 -0.5 NA 1 1.2
## 4 0.5 -0.5 NA 1 1.2
## 5 0.5 -0.5 NA 1 1.2
## 6 0.5 -0.5 NA 1 1.2
## 7 0.5 -0.5 NA 1 1.2
## 8 0.5 -0.5 NA 1 1.2
## 9 0.5 -0.5 NA 1 1.2
## 10 0.5 -0.5 NA 1 1.2
## 11 0.5 -0.5 NA 1 1.2
## 12 0.5 -0.5 NA 1 1.2
graph2 <- ggplot(dat2, aes(x=depr2, group= minsPAlevel)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..), stat = "count")) +
geom_text(aes(label = scales::percent(..prop..), y= ..prop.. ), stat= "count", vjust= -.5)+
facet_wrap(~minsPAlevel)+
labs(x="Level of depression", y="Proportion", title="Figure 2: Depression levels separated by level of physical activity", fill= "depr2") +
scale_fill_brewer(name = 'Levels', breaks = 1:4,
labels = levels(dat2$depr2), palette = 'Set2')+
scale_y_continuous(labels = scales::percent_format())+
theme_minimal()+
theme(axis.text.x = element_blank(), plot.title = element_text(vjust = 1, size=12))
graph2
data2 <-ggplot_build(graph2)
head(data2$data)
## [[1]]
## fill y count prop x stat group PANEL ymin ymax
## 1 #66C2A5 0.82692308 688 0.82692308 1 count 1 1 0 0.82692308
## 2 #FC8D62 0.12740385 106 0.12740385 2 count 1 1 0 0.12740385
## 3 #8DA0CB 0.02884615 24 0.02884615 3 count 1 1 0 0.02884615
## 4 #E78AC3 0.01682692 14 0.01682692 4 count 1 1 0 0.01682692
## 5 #66C2A5 0.80510441 347 0.80510441 1 count 2 2 0 0.80510441
## 6 #FC8D62 0.14385151 62 0.14385151 2 count 2 2 0 0.14385151
## 7 #8DA0CB 0.02320186 10 0.02320186 3 count 2 2 0 0.02320186
## 8 #E78AC3 0.02784223 12 0.02784223 4 count 2 2 0 0.02784223
## 9 #66C2A5 0.87500000 7 0.87500000 1 count 3 3 0 0.87500000
## 10 #FC8D62 0.12500000 1 0.12500000 2 count 3 3 0 0.12500000
## xmin xmax colour size linetype alpha
## 1 0.55 1.45 NA 0.5 1 NA
## 2 1.55 2.45 NA 0.5 1 NA
## 3 2.55 3.45 NA 0.5 1 NA
## 4 3.55 4.45 NA 0.5 1 NA
## 5 0.55 1.45 NA 0.5 1 NA
## 6 1.55 2.45 NA 0.5 1 NA
## 7 2.55 3.45 NA 0.5 1 NA
## 8 3.55 4.45 NA 0.5 1 NA
## 9 0.55 1.45 NA 0.5 1 NA
## 10 1.55 2.45 NA 0.5 1 NA
##
## [[2]]
## y label count prop x width group PANEL colour size angle
## 1 0.82692308 82.7% 688 0.82692308 1 0.9 1 1 black 3.88 0
## 2 0.12740385 12.7% 106 0.12740385 2 0.9 1 1 black 3.88 0
## 3 0.02884615 2.9% 24 0.02884615 3 0.9 1 1 black 3.88 0
## 4 0.01682692 1.7% 14 0.01682692 4 0.9 1 1 black 3.88 0
## 5 0.80510441 80.5% 347 0.80510441 1 0.9 2 2 black 3.88 0
## 6 0.14385151 14.4% 62 0.14385151 2 0.9 2 2 black 3.88 0
## 7 0.02320186 2.3% 10 0.02320186 3 0.9 2 2 black 3.88 0
## 8 0.02784223 2.8% 12 0.02784223 4 0.9 2 2 black 3.88 0
## 9 0.87500000 87.5% 7 0.87500000 1 0.9 3 3 black 3.88 0
## 10 0.12500000 12.5% 1 0.12500000 2 0.9 3 3 black 3.88 0
## hjust vjust alpha family fontface lineheight
## 1 0.5 -0.5 NA 1 1.2
## 2 0.5 -0.5 NA 1 1.2
## 3 0.5 -0.5 NA 1 1.2
## 4 0.5 -0.5 NA 1 1.2
## 5 0.5 -0.5 NA 1 1.2
## 6 0.5 -0.5 NA 1 1.2
## 7 0.5 -0.5 NA 1 1.2
## 8 0.5 -0.5 NA 1 1.2
## 9 0.5 -0.5 NA 1 1.2
## 10 0.5 -0.5 NA 1 1.2
Figure 1 does not appear to support our hypothesis. Participants with intense levels of sedentary activity seem about as likely to report feeling depressed as those with little sedentary activity. In fact, 88.46% of the intense sedentary group reported no depression, compared with 82.11% and 81.25% for the low and moderate groups. In Figure 2, those with intense levels of physical activity report somewhat less depression (87.5% reporting none, versus 80.51% and 82.69% for the moderate and low groups). This could suggest that the relationship runs in one direction only: higher levels of physical activity are associated with a lower risk of depression, while the level of sedentary activity matters less. Both intense groups are very small, however (26 participants for sedentary activity and 8 for physical activity), so these percentages should be read with caution.
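The percentages quoted for Figure 1 can be recomputed directly from the counts printed in the ggplot_build() output above. The sketch below simply re-enters those counts by hand as a matrix (the object name `counts` is ours) and converts each row to proportions.

```r
# Counts copied from the Figure 1 build output: rows are sedentary levels,
# columns are responses to "Feeling down, depressed, or hopeless"
counts <- matrix(
  c(707, 107, 27, 20,
    312,  61,  6,  5,
     23,   1,  1,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    sedentary = c("low", "moderate", "intense"),
    depr = c("Not at all", "Several days",
             "More than half the days", "Nearly every day")
  )
)

# Group sizes: the intense group is far smaller than the other two
group_n <- rowSums(counts)
group_n

# Row-wise percentages, matching the labels drawn on the bars
pct <- round(100 * prop.table(counts, margin = 1), 2)
pct
```

The three group sizes sum to the 1,271 observations in the cleaned dataset, and the "Not at all" column reproduces the 82.11%, 81.25%, and 88.46% figures discussed above.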
graph3 <- ggplot(dat2, aes(x=sleep2, group= minsSedlevel)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..), stat = "count")) +
geom_text(aes(label = scales::percent(..prop..), y= ..prop.. ), stat= "count", vjust= -.5) +
facet_wrap(~minsSedlevel) +
labs(x="Quality of Sleep",y="Proportion", title="Figure 3: Quality of sleep separated by level of sedentary activity", fill= "sleep2") +
scale_fill_brewer(name = 'Levels', breaks = 1:4,
labels = levels(dat2$sleep2), palette = 'Set2') +
scale_y_continuous(labels = scales::percent_format()) +
theme_minimal() +
theme(axis.text.x = element_blank(), plot.title = element_text(vjust = 1, size=12))
graph3
data3 <-ggplot_build(graph3)
head(data3$data)
## [[1]]
## fill y count prop x stat group PANEL ymin ymax
## 1 #66C2A5 0.66202091 570 0.66202091 1 count 1 1 0 0.66202091
## 2 #FC8D62 0.21254355 183 0.21254355 2 count 1 1 0 0.21254355
## 3 #8DA0CB 0.07200929 62 0.07200929 3 count 1 1 0 0.07200929
## 4 #E78AC3 0.05342625 46 0.05342625 4 count 1 1 0 0.05342625
## 5 #66C2A5 0.64583333 248 0.64583333 1 count 2 2 0 0.64583333
## 6 #FC8D62 0.23958333 92 0.23958333 2 count 2 2 0 0.23958333
## 7 #8DA0CB 0.05989583 23 0.05989583 3 count 2 2 0 0.05989583
## 8 #E78AC3 0.05468750 21 0.05468750 4 count 2 2 0 0.05468750
## 9 #66C2A5 0.50000000 13 0.50000000 1 count 3 3 0 0.50000000
## 10 #FC8D62 0.34615385 9 0.34615385 2 count 3 3 0 0.34615385
## 11 #8DA0CB 0.11538462 3 0.11538462 3 count 3 3 0 0.11538462
## 12 #E78AC3 0.03846154 1 0.03846154 4 count 3 3 0 0.03846154
## xmin xmax colour size linetype alpha
## 1 0.55 1.45 NA 0.5 1 NA
## 2 1.55 2.45 NA 0.5 1 NA
## 3 2.55 3.45 NA 0.5 1 NA
## 4 3.55 4.45 NA 0.5 1 NA
## 5 0.55 1.45 NA 0.5 1 NA
## 6 1.55 2.45 NA 0.5 1 NA
## 7 2.55 3.45 NA 0.5 1 NA
## 8 3.55 4.45 NA 0.5 1 NA
## 9 0.55 1.45 NA 0.5 1 NA
## 10 1.55 2.45 NA 0.5 1 NA
## 11 2.55 3.45 NA 0.5 1 NA
## 12 3.55 4.45 NA 0.5 1 NA
##
## [[2]]
## y label count prop x width group PANEL colour size angle
## 1 0.66202091 66.2% 570 0.66202091 1 0.9 1 1 black 3.88 0
## 2 0.21254355 21.3% 183 0.21254355 2 0.9 1 1 black 3.88 0
## 3 0.07200929 7.2% 62 0.07200929 3 0.9 1 1 black 3.88 0
## 4 0.05342625 5.3% 46 0.05342625 4 0.9 1 1 black 3.88 0
## 5 0.64583333 64.6% 248 0.64583333 1 0.9 2 2 black 3.88 0
## 6 0.23958333 24.0% 92 0.23958333 2 0.9 2 2 black 3.88 0
## 7 0.05989583 6.0% 23 0.05989583 3 0.9 2 2 black 3.88 0
## 8 0.05468750 5.5% 21 0.05468750 4 0.9 2 2 black 3.88 0
## 9 0.50000000 50.0% 13 0.50000000 1 0.9 3 3 black 3.88 0
## 10 0.34615385 34.6% 9 0.34615385 2 0.9 3 3 black 3.88 0
## 11 0.11538462 11.5% 3 0.11538462 3 0.9 3 3 black 3.88 0
## 12 0.03846154 3.8% 1 0.03846154 4 0.9 3 3 black 3.88 0
## hjust vjust alpha family fontface lineheight
## 1 0.5 -0.5 NA 1 1.2
## 2 0.5 -0.5 NA 1 1.2
## 3 0.5 -0.5 NA 1 1.2
## 4 0.5 -0.5 NA 1 1.2
## 5 0.5 -0.5 NA 1 1.2
## 6 0.5 -0.5 NA 1 1.2
## 7 0.5 -0.5 NA 1 1.2
## 8 0.5 -0.5 NA 1 1.2
## 9 0.5 -0.5 NA 1 1.2
## 10 0.5 -0.5 NA 1 1.2
## 11 0.5 -0.5 NA 1 1.2
## 12 0.5 -0.5 NA 1 1.2
graph4 <- ggplot(dat2, aes(x=sleep2, group= minsPAlevel)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..), stat = "count")) +
geom_text(aes(label = scales::percent(..prop..), y= ..prop.. ), stat= "count", vjust= -.5) +
facet_wrap(~minsPAlevel) +
labs(x="Quality of Sleep",y="Proportion", title="Figure 4: Quality of sleep separated by level of physical activity", fill= "sleep2") +
scale_fill_brewer(name = 'Levels', breaks = 1:4,
labels = levels(dat2$sleep2), palette = 'Set2') +
scale_y_continuous(labels = scales::percent_format()) +
theme_minimal() +
theme(axis.text.x = element_blank(), plot.title = element_text(vjust = 1, size=12))
graph4
data4 <-ggplot_build(graph4)
head(data4$data)
## [[1]]
## fill y count prop x stat group PANEL ymin ymax
## 1 #66C2A5 0.65625000 546 0.65625000 1 count 1 1 0 0.65625000
## 2 #FC8D62 0.22475962 187 0.22475962 2 count 1 1 0 0.22475962
## 3 #8DA0CB 0.06490385 54 0.06490385 3 count 1 1 0 0.06490385
## 4 #E78AC3 0.05408654 45 0.05408654 4 count 1 1 0 0.05408654
## 5 #66C2A5 0.64965197 280 0.64965197 1 count 2 2 0 0.64965197
## 6 #FC8D62 0.22041763 95 0.22041763 2 count 2 2 0 0.22041763
## 7 #8DA0CB 0.07656613 33 0.07656613 3 count 2 2 0 0.07656613
## 8 #E78AC3 0.05336427 23 0.05336427 4 count 2 2 0 0.05336427
## 9 #66C2A5 0.62500000 5 0.62500000 1 count 3 3 0 0.62500000
## 10 #FC8D62 0.25000000 2 0.25000000 2 count 3 3 0 0.25000000
## 11 #8DA0CB 0.12500000 1 0.12500000 3 count 3 3 0 0.12500000
## xmin xmax colour size linetype alpha
## 1 0.55 1.45 NA 0.5 1 NA
## 2 1.55 2.45 NA 0.5 1 NA
## 3 2.55 3.45 NA 0.5 1 NA
## 4 3.55 4.45 NA 0.5 1 NA
## 5 0.55 1.45 NA 0.5 1 NA
## 6 1.55 2.45 NA 0.5 1 NA
## 7 2.55 3.45 NA 0.5 1 NA
## 8 3.55 4.45 NA 0.5 1 NA
## 9 0.55 1.45 NA 0.5 1 NA
## 10 1.55 2.45 NA 0.5 1 NA
## 11 2.55 3.45 NA 0.5 1 NA
##
## [[2]]
## y label count prop x width group PANEL colour size angle
## 1 0.65625000 65.6% 546 0.65625000 1 0.9 1 1 black 3.88 0
## 2 0.22475962 22.5% 187 0.22475962 2 0.9 1 1 black 3.88 0
## 3 0.06490385 6.5% 54 0.06490385 3 0.9 1 1 black 3.88 0
## 4 0.05408654 5.4% 45 0.05408654 4 0.9 1 1 black 3.88 0
## 5 0.64965197 65.0% 280 0.64965197 1 0.9 2 2 black 3.88 0
## 6 0.22041763 22.0% 95 0.22041763 2 0.9 2 2 black 3.88 0
## 7 0.07656613 7.7% 33 0.07656613 3 0.9 2 2 black 3.88 0
## 8 0.05336427 5.3% 23 0.05336427 4 0.9 2 2 black 3.88 0
## 9 0.62500000 62.5% 5 0.62500000 1 0.9 3 3 black 3.88 0
## 10 0.25000000 25.0% 2 0.25000000 2 0.9 3 3 black 3.88 0
## 11 0.12500000 12.5% 1 0.12500000 3 0.9 3 3 black 3.88 0
## hjust vjust alpha family fontface lineheight
## 1 0.5 -0.5 NA 1 1.2
## 2 0.5 -0.5 NA 1 1.2
## 3 0.5 -0.5 NA 1 1.2
## 4 0.5 -0.5 NA 1 1.2
## 5 0.5 -0.5 NA 1 1.2
## 6 0.5 -0.5 NA 1 1.2
## 7 0.5 -0.5 NA 1 1.2
## 8 0.5 -0.5 NA 1 1.2
## 9 0.5 -0.5 NA 1 1.2
## 10 0.5 -0.5 NA 1 1.2
## 11 0.5 -0.5 NA 1 1.2
The graphs for sleep tell a similar story to the graphs for feelings of depression, with one difference. In Figure 3, the low and moderate sedentary groups look alike (66.2% and 64.6% report no sleep trouble), while only 50% of the intense sedentary group does. In Figure 4, the low and moderate physical activity groups are nearly identical (65.6% and 65.0% report no sleep trouble), and the intense group is close behind at 62.5%. However, so few people report intense levels of either activity (26 and 8 participants, respectively) that it is difficult to draw any conclusion about them, and the plots alone cannot establish whether any of these differences is statistically significant.
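One way to check whether the sleep responses differ across physical activity levels is a chi-squared test of independence on the counts from the Figure 4 build output. The sketch below is not part of the original analysis; it re-enters the printed counts by hand and uses a simulated p-value because the intense group is too small for the usual large-sample approximation. The object names and the seed are our own choices.

```r
# Counts copied from the Figure 4 build output: rows are physical activity
# levels, columns are responses to "Trouble sleeping or sleeping too much"
sleep_by_pa <- matrix(
  c(546, 187, 54, 45,
    280,  95, 33, 23,
      5,   2,  1,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    activity = c("low", "moderate", "intense"),
    sleep = c("Not at all", "Several days",
              "More than half the days", "Nearly every day")
  )
)

# Monte Carlo p-value avoids relying on expected counts being large
set.seed(151)
res <- chisq.test(sleep_by_pa, simulate.p.value = TRUE, B = 2000)
res$p.value
```

A large p-value would be consistent with the impression from the plot that sleep responses look much the same at every activity level, though with only 8 participants in the intense group the test has little power to detect a difference there.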
#Original graphs factored by Sex
graph5 <- ggplot(data=dat2, aes(x=depr2, fill = factor(gender))) + geom_bar(position="dodge") + facet_wrap(~minsSedlevel)+labs(x="Frequency of feeling depressed", y="Count", title="Figure 5: Frequency of feeling depressed separated \n by level of sedentary activity factored by sex")+ theme_minimal()+
theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph5
graph6 <-ggplot(data=dat2, aes(x=depr2, fill = gender)) + geom_bar(position="dodge") + facet_wrap(~minsPAlevel)+labs(x="Frequency of feeling depressed", y="Count", title="Figure 6: Frequency of feeling depressed separated \n by level of physical activity factored by sex")+
theme_minimal()+
theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph6
While these charts seem to support the conclusions drawn above, the effect may be larger for females than for males. In Figure 5, there is a large gender difference among those with low and moderate sedentary activity. However, participants with intense sedentary activity have similar levels of depression regardless of gender. In Figure 6, females with low levels of physical activity have a higher rate of depression than males. That relationship reverses at moderate and intense levels of physical activity.
graph7 <- ggplot(data=dat2, aes(x=sleep2, fill = factor(gender))) + geom_bar(position="dodge") + facet_wrap(~minsSedlevel)+labs(x="Quality of Sleep", y="Count", title="Figure 7: Quality of sleep separated \n by level of sedentary activity factored by sex")+
theme_minimal()+
theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph7
graph8 <- ggplot(data=dat2, aes(x=sleep2, fill = factor(gender))) + geom_bar(position="dodge") + facet_wrap(~minsPAlevel)+labs(x="Quality of Sleep", y="Count", title="Figure 8: Quality of sleep separated \n by level of physical activity factored by sex ")+
theme_minimal()+
theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph8
In Figure 7, the gender difference is less pronounced than in Figure 5. The more sedentary activity participants report, the smaller the role gender appears to play in their quality of sleep. The most variation exists among those with low sedentary activity, where females experience better quality sleep than males. In Figure 8, the only noticeable pattern is that participants with moderate and intense physical activity.
#Original Graphs factored by age
graph9 <- ggplot(data=dat2, aes(x=depr2, fill = factor(agelevel))) + geom_bar(position="dodge") + facet_wrap(~minsSedlevel)+labs(x="Frequency of feeling depressed", y="Count", title="Figure 9: Frequency of feeling depressed separated \n by level of sedentary activity factored by age")+
theme_minimal()+
theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph9
graph10 <-ggplot(data=dat2, aes(x=depr2, fill = factor(agelevel))) + geom_bar(position="dodge") + facet_wrap(~minsPAlevel)+labs(x="Frequency of feeling depressed", y="Count", title="Figure 10: Frequency of feeling depressed separated \n by level of physical activity factored by age")+
theme_minimal()+
theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph10
In both Figure 9 and Figure 10, age does not seem to play a role in the relationship between feelings of depression and levels of sedentary and physical activity. An interesting relationship may exist between age and depression among those with intense sedentary activity; however, too few participants reported intense sedentary behavior to draw a proper conclusion.
graph11 <- ggplot(data=dat2, aes(x=sleep2, fill = factor(agelevel))) + geom_bar(position="dodge") + facet_wrap(~minsSedlevel)+labs(x="Quality of Sleep", y="Count", title="Figure 11: Quality of sleep separated \n by level of sedentary activity factored by age")+
theme_minimal()+
theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph11
graph12 <- ggplot(data=dat2, aes(x=sleep2, fill = factor(agelevel))) + geom_bar(position="dodge") + facet_wrap(~minsPAlevel)+labs(x="Quality of Sleep", y="Count", title="Figure 12: Quality of sleep separated \n by level of physical activity factored by age")+
theme_minimal()+
theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph12
Figure 11 closely resembles Figure 9 in its pattern across age groups: age appears to play little to no role in the relationship between sedentary activity and quality of sleep. In both Figure 11 and Figure 12, no reliable conclusion can be drawn by age for the participants who reported intense physical or sedentary activity, because those groups are too small. In Figure 12, for both young and middle-aged participants, those who report moderate physical activity also report better sleep quality than those with low physical activity. For those in the oldest age group, however, there seems to be less of a difference.
phys1$PAD660 <- as.numeric(phys1$PAD660)
phys1$PAD680 <- as.numeric(phys1$PAD680)
graph13 <- ggplot(phys1, aes(x=PAD660, y=PAD680))+
geom_point(alpha=0.8)+
geom_smooth(method= "lm", se= FALSE)+
xlab("Minutes of Vigorous Phys Activity Each Day")+
ylab("Minutes Spent Sedentary Each Day")+
ggtitle("Figure 13: Time Spent Active Vs. Time Spent Sedentary")+
theme_minimal()
ggplotly(graph13)
Figure 13 shows the relationship between the amount of time people spend being active and the amount of time they spend sedentary. As expected, the more physical activity people perform, the less sedentary they are, but not by a substantial amount.
Model 1: A simple linear regression of time spent doing physical activity on time spent sedentary
\[minsPA= \beta_0 + \beta_1 minsSed\]
Model 2: A simple linear regression of time spent doing physical activity on gender
\[minsPA= \beta_0 + \beta_1 gender\]
Model 3: A multiple linear regression of time spent doing physical activity on the mental health variables (in binary form: 0 for “Not at all”, 1 for anything else)
\[minsPA= \beta_0 + \beta_1 depr + \beta_2 sleep + \beta_3 feelBad + \beta_4 concen\]
Model 4: A multiple linear regression of time spent sedentary on the mental health variables (in binary form: 0 for “Not at all”, 1 for anything else)
\[minsSed= \beta_0 + \beta_1 depr + \beta_2 sleep + \beta_3 feelBad + \beta_4 concen\]
# Model 1: minutes of vigorous physical activity regressed on minutes spent sedentary
dat2$minsPA <- as.numeric(dat2$minsPA)
dat2$minsSed <- as.numeric(dat2$minsSed)
m1 <- lm(minsPA ~ minsSed, data = dat2)
# Model 2: minutes of vigorous physical activity regressed on gender (a factor)
m2 <- lm(minsPA ~ gender, data = dat2)
# Models 3 and 4: physical activity and sedentary time regressed on the binary mental health variables
m3 <- lm(minsPA ~ depr3 + sleep3 + feelBad3 + concen3, data = dat2)
m4 <- lm(minsSed ~ depr3 + sleep3 + feelBad3 + concen3, data = dat2)
stargazer(list(m1, m2), type = 'html', title= "Simple Linear Regressions on Physical Activity", align=TRUE,
covariate.labels = c("Mins Sedentary Activity", "Gender"),
dep.var.labels = "Mins Physical Activity",
column.labels = c("Model 1", "Model 2"))
| | Model 1 (1) | Model 2 (2) |
| --- | --- | --- |
| Dependent variable | Mins Physical Activity | Mins Physical Activity |
| Mins Sedentary Activity | -0.018** | |
| | (0.008) | |
| Gender | | -16.356*** |
| | | (2.966) |
| Constant | 82.352*** | 83.379*** |
| | (2.965) | (1.928) |
| Observations | 1,271 | 1,271 |
| R2 | 0.004 | 0.023 |
| Adjusted R2 | 0.003 | 0.023 |
| Residual Std. Error (df = 1269) | 52.737 | 52.224 |
| F Statistic (df = 1; 1269) | 5.244** | 30.421*** |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
stargazer(list(m3, m4), type = 'html', title= "Multiple Linear Regressions on Physical Activity With Depression Variables", align=TRUE,
covariate.labels = c("Felt Depressed", "Poor/Too Much Sleep", "Felt Bad About Oneself","Trouble Concentrating" ),
dep.var.labels = c("Mins Physical Activity","Mins Sedentary"),
column.labels = c("Model 3", "Model 4"))
| | Model 3 (1) | Model 4 (2) |
| --- | --- | --- |
| Dependent variable | Mins Physical Activity | Mins Sedentary |
| Felt Depressed | 3.922 | -33.431* |
| | (4.750) | (17.317) |
| Poor/Too Much Sleep | 0.109 | 15.810 |
| | (3.362) | (12.257) |
| Felt Bad About Oneself | -1.711 | 30.534 |
| | (5.407) | (19.712) |
| Trouble Concentrating | -6.141 | 17.497 |
| | (4.789) | (17.460) |
| Constant | 76.765*** | 329.493*** |
| | (1.883) | (6.867) |
| Observations | 1,271 | 1,271 |
| R2 | 0.002 | 0.006 |
| Adjusted R2 | -0.001 | 0.002 |
| Residual Std. Error (df = 1266) | 52.864 | 192.728 |
| F Statistic (df = 4; 1266) | 0.529 | 1.762 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
Model 1 implies that a one-minute increase in time spent sedentary is associated with a 0.018-minute decrease in time spent doing vigorous physical activity. This is statistically significant at the 0.05 level. Model 2 regresses physical activity on gender, which enters the model as a factor with the raw NHANES coding (1 = male, 2 = female), so the coefficient compares females with males: being female is associated with 16.356 fewer minutes of vigorous physical activity. This is statistically significant at the 0.01 level.
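To illustrate how a coefficient on a two-level factor should be read, here is a minimal sketch on invented data. The data frame `toy` is hypothetical, with made-up minutes and with "male" set as the reference level. In a regression on a single binary factor, the intercept is the mean of the reference group and the slope is the difference between the two group means.

```r
# Invented data: three participants per group
toy <- data.frame(
  minsPA = c(60, 80, 100, 40, 50, 60),
  gender = factor(c("male", "male", "male", "female", "female", "female"),
                  levels = c("male", "female"))  # first level is the reference
)

fit <- lm(minsPA ~ gender, data = toy)
coef(fit)

# The same numbers from group means
group_means <- tapply(toy$minsPA, toy$gender, mean)
group_means[["female"]] - group_means[["male"]]
```

Here the male mean is 80 and the female mean is 50, so the fitted intercept is 80 and the `genderfemale` coefficient is -30. The Gender coefficient in Model 2 can be read the same way, as a difference in average minutes between the two groups.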
For Model 3, our multiple linear regression on physical activity, we observe the following. Reporting any depression (moving from “Not at all” to any of the other included options) is associated with 3.922 more minutes of physical activity; trouble sleeping or sleeping too much with 0.109 more minutes; having felt bad about oneself at any time over the 2 weeks prior to the screening with 1.711 fewer minutes; and having had trouble concentrating at any point with 6.141 fewer minutes. However, none of these coefficients is statistically significant at any conventional level, so the results are inconclusive.
For Model 4, our multiple linear regression on sedentary activity, we observe the following. Reporting any depression (moving from “Not at all” to any of the other included options) is associated with 33.431 fewer minutes spent sedentary, which is statistically significant at the 0.1 level. Trouble sleeping or sleeping too much is associated with 15.810 more minutes spent sedentary, having felt bad about oneself at any time over the 2 weeks prior to the screening with 30.534 more minutes, and having had trouble concentrating at any point with 17.497 more minutes. None of these three coefficients is statistically significant at any conventional level.
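The single star on the Felt Depressed coefficient in Model 4 can be checked by hand from the numbers in the table. The sketch below divides the estimate by its standard error to get the t statistic and then computes a two-sided p-value using the residual degrees of freedom reported by stargazer. The variable names are ours.

```r
# Values copied from the Model 4 column of the regression table
estimate  <- -33.431
std_error <- 17.317
df_resid  <- 1266

# t statistic and two-sided p-value
t_stat  <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

t_stat
p_value
```

The p-value falls between 0.05 and 0.10, which is why the coefficient earns one star but not two. The result is only weakly significant, so the conclusion drawn from it deserves caution.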
The general findings from visualizing the data are as follows:
From our regressions, we conclude that minutes spent sedentary each day and minutes spent doing vigorous activity each day are negatively related, and that this relationship is statistically significant. This is expected, since more active people plausibly spend less time sitting each day.
From our multiple linear regression models, we find only one statistically significant coefficient, and only at the 0.1 level: having felt depressed at any point within the 2 weeks prior to the screening is associated with 33.431 fewer minutes spent sedentary each day. This finding runs against our hypothesis. Thus, we cannot claim that more time spent doing vigorous physical activity each day correlates with better mental health, nor that more time spent sedentary correlates with poorer mental health.
Although the original data represent the national population and are recent enough to assume any relationships found between variables are still applicable, there are several limitations to the data.
The limitations are as follows:
These limitations could skew our analysis, especially considering that some relationships were only marginally statistically significant or not significant at all.
Another limitation is the lack of quantitative variables for meaningful regression analysis. Only 2 of our variables were truly quantitative (minsPA and minsSed); most of our predictors were qualitative, which limits the insight the regressions can offer.
One major caveat is that the sample sizes across the activity categories (“low,” “moderate,” “intense”) are far from even. Comparisons among groups are therefore less reliable than they would be with large, balanced samples in each category.